A complexity-effective microprocessor design with decoupled dispatch queues and prefetching
Abstract
Continuing demand for high degrees of Instruction Level Parallelism (ILP) requires large dispatch queues (or centralized reservation stations) in modern superscalar microprocessors. However, such large dispatch queues inevitably bring high circuit complexity, which in turn limits the pipeline clock rate. In other words, increasing the size of the dispatch queue ultimately hinders attempts to increase the clock speed. This is because most of today’s designs are based on a centralized dispatch queue that depends on global broadcast operations to wake up and select the ready instructions. As an alternative to this conventional design, we propose hierarchically distributed dispatch queues based on access/execute decoupled architectures. Simulation results on 14 data-intensive benchmarks show that our DDQ (Decoupled Dispatch Queues) design achieves performance comparable to that of a superscalar machine with a large dispatch queue, while using small, distributed dispatch queues that can be implemented with low hardware complexity and high clock rates. © 2009 Elsevier B.V. All rights reserved.
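To make the cost argument in the abstract concrete, here is a minimal toy sketch of broadcast-based wakeup and select. It is not the paper's DDQ design: the `Entry` type, the one-comparison-per-entry cost model, the queue sizes (64 versus four queues of 16), and the idea that a result tag is broadcast only inside one small queue are all assumptions made for illustration, since the abstract does not describe the mechanism in detail.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    dest: int                                   # tag this instruction will produce
    waiting: set = field(default_factory=set)   # source tags still outstanding


def broadcast_wakeup(queue, tag):
    """Broadcast one completed tag to every entry of `queue`.

    Returns the number of tag comparisons performed. In this toy model
    that is one per entry, so the cost grows linearly with queue size.
    """
    comparisons = 0
    for entry in queue:
        comparisons += 1
        entry.waiting.discard(tag)
    return comparisons


def select_ready(queue):
    """Return the entries whose source operands are all available."""
    return [e for e in queue if not e.waiting]


def make_queue(size, producer_tag):
    # Hypothetical workload: every entry waits on the same producer tag.
    return [Entry(dest=100 + i, waiting={producer_tag}) for i in range(size)]


# Centralized: a single 64-entry queue sees every broadcast.
central = make_queue(64, producer_tag=7)
central_cost = broadcast_wakeup(central, 7)

# Distributed (assumption): four 16-entry queues, broadcast stays in one of them.
small_queues = [make_queue(16, producer_tag=7) for _ in range(4)]
local_cost = broadcast_wakeup(small_queues[0], 7)

print(central_cost, local_cost)  # prints "64 16"
```

In this sketch the per-broadcast work is proportional to the size of the queue that receives it, which is the intuition behind preferring several small queues over one large one. Real wakeup logic is a parallel comparator array rather than a loop, so the numbers stand in for circuit fan-out and wire load, not for execution time.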
Similar resources
Design and Effectiveness of Small-Sized Decoupled Dispatch Queues
Continuing demands for high degrees of Instruction Level Parallelism (ILP) require large dispatch queues in modern superscalar microprocessors. However, such large queues are inevitably accompanied by high circuit complexity which correspondingly limits the pipeline clock rates. This is due to the fact that most of today’s designs are based upon a centralized dispatch queue which depends on glo...
Topic 7: Parallel Computer Architecture and Instruction Level Parallelism
We welcome you to the two Parallel Computer Architecture and Instruction Level Parallelism sessions of the Euro-Par 2006 conference, held in Dresden, Germany. The call for papers for this Euro-Par topic area sought papers on all hardware/software aspects of parallel computer architecture, processor architecture and microarchitecture. This year 12 papers were submitted to this topic area. Among...
A low-complexity microprocessor design with speculative pre-execution
Current superscalar architectures strongly depend on an instruction issue queue to achieve multiple instruction issue and out-of-order execution. However, the issue queue requires a centralized structure and mainly causes globally broadcasting operations to wakeup and select the instructions. Therefore, a large issue queue ultimately results in a low clock rate along with a high circuit complex...
A Performance-Correctness Explicitly-Decoupled Architecture: Technical Report
Optimizing the common case has been an adage in decades of processor design practices. However, as the system complexity and optimization techniques’ sophistication have increased substantially, maintaining correctness under all situations, however unlikely, is contributing to the necessity of extra conservatism in all layers of the system design. The mounting process, voltage, and temperature ...
PA-8000 Combines Complexity and Speed: 11/14/94
Long a proponent of simple, fast processors, HP has succumbed to the siren call of complexity, creating the most feature-filled RISC design yet revealed. Steve Manglesdorf, presenting at last month’s Microprocessor Forum, said that the forthcoming PA-8000 will achieve high clock rates despite the burden of this feature set, a powerful combination that he claims will create the industry’s fastes...
Journal title:
- Parallel Computing
Volume 35, Issue
Pages -
Publication date: 2009